This PhD thesis concerns the study and development of hierarchical
representations for spatio-temporal visual attention modeling and understanding
in video sequences. More specifically, we propose two computational models for
visual attention. First, we present a generative probabilistic model for
context-aware visual attention modeling and understanding. Second, we develop
a deep network architecture for visual attention modeling, which first
estimates top-down spatio-temporal visual attention and is ultimately used to
model attention in the temporal domain.
( 2
min )
In private federated learning (FL), a server aggregates differentially
private updates from a large number of clients in order to train a machine
learning model. The main challenge in this setting is balancing privacy with
both the classification accuracy of the learnt model and the number of bits
communicated between the clients and the server. Prior work has achieved a good
trade-off by designing a privacy-aware compression mechanism, called the
minimum variance unbiased (MVU) mechanism, that numerically solves an
optimization problem to determine the parameters of the mechanism. This paper
builds upon it by introducing a new interpolation procedure in the numerical
design process that allows for a far more efficient privacy analysis. The
result is the new Interpolated MVU mechanism that is more scalable, has a
better privacy-utility trade-off, and provides SOTA results on
communication-efficient private FL on a variety of datasets.
( 2
min )
As data shift or new data become available, updating clinical machine
learning models may be necessary to maintain or improve performance over time.
However, updating a model can introduce compatibility issues when the behavior
of the updated model does not align with user expectations, resulting in poor
user-model team performance. Existing compatibility measures depend on model
decision thresholds, limiting their applicability in settings where models are
used to generate rankings based on estimated risk. To address this limitation,
we propose a novel rank-based compatibility measure, $C^R$, and a new loss
function that aims to optimize discriminative performance while encouraging
good compatibility. Applied to a case study in mortality risk stratification
leveraging data from MIMIC, our approach yields more compatible models while
maintaining discriminative performance compared to existing model selection
techniques, with an increase in $C^R$ of $0.019$ ($95\%$ confidence interval:
$0.005$, $0.035$). This work provides new tools to analyze and update risk
stratification models used in clinical care.
( 2
min )
In this research, we conducted a comparative study of four Quantum Machine
Learning (QML) models for fraud detection in finance. We found that the Quantum
Support Vector Classifier achieved the highest performance, with F1 scores of
0.98 for both the fraud and non-fraud classes. The other models, namely the
Variational Quantum Classifier, Estimator Quantum Neural Network (QNN), and
Sampler QNN, demonstrate promising results, underscoring the potential of QML
classification for financial applications. Although these models exhibit
certain limitations, the insights attained pave the way for future enhancements
and optimisation strategies. Challenges remain, however, including the need for
more efficient quantum algorithms and for larger and more complex datasets. The
article provides solutions to overcome current limitations and contributes new
insights to the field of Quantum Machine Learning in fraud detection, with
important implications for its future development.
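For readers who want to experiment with a quantum-kernel classifier of the kind
compared in the study, the following minimal sketch assumes the Qiskit Machine
Learning API; the class names follow its documentation but vary across
versions, and the toy features and labels are illustrative stand-ins rather
than the paper's data.

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score
from qiskit.circuit.library import ZZFeatureMap
from qiskit_machine_learning.kernels import FidelityQuantumKernel
from qiskit_machine_learning.algorithms import QSVC

# Toy stand-in for transaction features and fraud labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] * X[:, 1] > 0).astype(int)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Quantum kernel: encode features with a ZZ feature map, then use the
# state-fidelity kernel inside a standard support vector classifier.
feature_map = ZZFeatureMap(feature_dimension=2, reps=2)
qsvc = QSVC(quantum_kernel=FidelityQuantumKernel(feature_map=feature_map))
qsvc.fit(X_tr, y_tr)
print("F1:", f1_score(y_te, qsvc.predict(X_te)))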
( 2
min )
Stroke is a significant cause of mortality and morbidity, necessitating early
predictive strategies to minimize risks. Traditional methods for evaluating
patients, such as Acute Physiology and Chronic Health Evaluation (APACHE II,
IV) and Simplified Acute Physiology Score III (SAPS III), have limited accuracy
and interpretability. This paper proposes a novel approach: an interpretable,
attention-based transformer model for early stroke mortality prediction. This
model seeks to address the limitations of previous predictive models, offering
both interpretability (clear, understandable explanations of the model) and
fidelity (a truthful account of the model's dynamics from
input to output). Furthermore, the study explores and compares fidelity and
interpretability scores using Shapley values and attention-based scores to
improve model explainability. The research objectives include designing an
interpretable attention-based transformer model, evaluating its performance
compared to existing models, and providing feature importance derived from the
model.
( 2
min )
This paper presents an investigation into machine learning techniques for
violence detection in videos and their adaptation to a federated learning
context. The study includes experiments with spatio-temporal features extracted
from benchmark video datasets, comparison of different methods, and proposal of
a modified version of the "Flow-Gated" architecture called "Diff-Gated."
Additionally, various machine learning techniques, including super-convergence
and transfer learning, are explored, and a method for adapting centralized
datasets to a federated learning context is developed. The research achieves
better accuracy than state-of-the-art models by training the
best violence detection model in a federated learning context.
( 2
min )
This paper demonstrates the utility of organized numerical representations of
genes in research involving flat string gene formats (i.e., FASTA/FASTQ).
FASTA/FASTQ files have several current limitations, such as their large file
sizes, slow processing speeds for mapping and alignment, and contextual
dependencies. These challenges significantly hinder investigations and tasks
that involve finding similar sequences. The solution lies in transforming
sequences into an alternative representation that facilitates easier clustering
into similar groups compared to the raw sequences themselves. By assigning a
unique vector embedding to each short sequence, it is possible to more
efficiently cluster and improve upon compression performance for the string
representations of cDNA libraries. Furthermore, through learning alternative
coordinate vector embeddings based on the contexts of codon triplets, we can
demonstrate clustering based on amino acid properties. Finally, using this
sequence embedding method to encode barcodes and cDNA sequences, we can improve
the time complexity of the similarity search by coupling vector embeddings with
an algorithm that determines the proximity of vectors in Euclidean space; this
allows us to perform sequence similarity searches in a quicker and more modular
fashion.
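To make the last step concrete, here is a minimal sketch of the
embed-then-search pattern described above; the 3-mer count profile is a simple
stand-in for a learned embedding, and the example sequences are hypothetical.

import numpy as np
from itertools import product
from scipy.spatial import cKDTree

# Map each short sequence to a fixed-length vector via 3-mer counts.
INDEX = {"".join(p): i for i, p in enumerate(product("ACGT", repeat=3))}

def kmer_profile(seq, k=3):
    v = np.zeros(len(INDEX))
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in INDEX:
            v[INDEX[kmer]] += 1.0
    n = np.linalg.norm(v)
    return v / n if n else v

library = ["ATGCGTACGT", "ATGCGTACGA", "TTTTCCCCGG", "GGGGAAAATT"]
embeddings = np.array([kmer_profile(s) for s in library])
tree = cKDTree(embeddings)               # spatial index over the embeddings
dist, idx = tree.query(kmer_profile("ATGCGTACGT"), k=2)
print([library[i] for i in idx])         # two nearest library sequences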
( 2
min )
Amazon Translate is a neural machine translation service that delivers fast, high-quality, affordable, and customizable language translation. When you translate from one language to another, you want your machine translation to be accurate, fluent, and, most importantly, contextual. Domain-specific and language-specific customizable terminology is a key requirement for many government and commercial organizations. Custom terminology […]
( 5
min )
Natural language processing (NLP) is the field of machine learning (ML) concerned with giving computers the ability to understand text and spoken words in the same way as human beings can. Recently, state-of-the-art architectures like the transformer are used to achieve near-human performance on NLP downstream tasks like text summarization, text classification, entity recognition, […]
( 11
min )
Source: ArabianBusiness. Takeaways: Artificial Intelligence (AI) continues to evolve at a rapid pace, with groundbreaking strides in generative capabilities playing a critical role in defining this ever-evolving landscape. One such transformative leap is the advent of Program-Aided Language models (PAL), an innovative solution that revolutionizes how Language Learning Models (LLMs) function. This article delves into…
The post Pushing boundaries with Generative AI: How Program-aided Language model (PAL) enhances Large Language Models (LLMs) for superior AI performance appeared first on Data Science Central.
( 22
min )
Learn about the challenges of data privacy and security, and the potential of smart technologies in creating efficient, livable urban environments.
The post Understanding the future of smart cities through data science appeared first on Data Science Central.
( 20
min )
This content was given as a keynote at the Workshop of Applied Data Science for Healthcare and covered during a tutorial at the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, a premier forum for advancement, education, and adoption of the discipline of knowledge discovery and data mining. Recent and noteworthy advancements in […]
The post Microsoft at KDD 2023: Advancing health at the speed of AI appeared first on Microsoft Research.
( 12
min )
In this post, we present a cross-account observability dashboard that provides a centralized view for monitoring SageMaker user activities and resources across multiple accounts. It allows the end-users and cloud management team to efficiently monitor what ML workloads are running, view the status of these workloads, and trace back different account activities at certain points of time.
( 12
min )
Creative advertising has the potential to be revolutionized by generative AI (GenAI). You can now create a wide variation of novel images, such as product shots, by retraining a GenAI model and providing a few inputs into the model, such as textual prompts (sentences describing the scene and objects to be produced by the model). […]
( 9
min )
Background: In the previous part of this blog, we explored the limitations of GPT-4. In this post, we will explore whether open source models can overcome the limitations of black box models. Specifically, we will consider the use of Llama 2 in this scenario. The Llama 2 paper from Meta is very comprehensive. Llama 2 is…
The post Generative AI megatrends: implications of GPT-4 drift and open source models – part two appeared first on Data Science Central.
( 19
min )
As generative AI continues to sweep an increasingly digital, hyperconnected world, NVIDIA founder and CEO Jensen Huang made a thunderous return to SIGGRAPH, the world’s premier computer graphics conference. “The generative AI era is upon us, the iPhone moment if you will,” Huang told an audience of thousands Tuesday during an in-person special address in…
( 9
min )
Machine learning helped Waseem Alshikh plow through textbooks in college. Now he’s putting generative AI to work, creating content for hundreds of companies. Born and raised in Syria, Alshikh spoke no English, but he was fluent in software, a talent that served him well when he arrived at college in Lebanon. “The first day they…
( 6
min )
Organizations across industries are using extended reality (XR) to redesign workflows and boost productivity, whether for immersive training or collaborative design reviews. With the growing use of all-in-one (AIO) headsets, more teams have adopted and integrated XR. While easing XR use, AIO headsets have modest compute and rendering power that can limit the graphics quality…
( 6
min )
Professionals, teams, creators and others can tap into the power of AI to create high-quality audio and video effects — even using standard microphones and webcams — with the help of NVIDIA Maxine. The suite of GPU-accelerated software development kits and cloud-native microservices lets users deploy AI features that enhance audio, video and augmented-reality effects…
( 8
min )
AI and accelerated computing were in the spotlight at SIGGRAPH — the world’s largest gathering of computer graphics experts — as NVIDIA founder and CEO Jensen Huang announced during his keynote address updates to NVIDIA Omniverse, a platform for building and connecting 3D tools and applications, as well as acceleration for Universal Scene Description (known as OpenUSD), the open and extensible ecosystem for 3D worlds.
( 10
min )
Picture this: Creators can quickly create and customize 3D scene backgrounds with the help of generative AI, thanks to cutting-edge tools from Shutterstock. The visual-content provider is building services using NVIDIA Picasso — a cloud-based foundry for developing generative AI models for visual design. The work incorporates Picasso’s latest feature — announced today during NVIDIA…
( 6
min )
NVIDIA researchers are taking the stage at SIGGRAPH, the world’s largest computer graphics conference, to demonstrate a generative AI workflow that helps artists rapidly create and iterate on materials for 3D scenes. The research demo, which will be presented today at the show’s Real-Time Live event, showcases how artists can use text or image prompts…
( 6
min )
DENZA, the luxury EV brand joint venture between BYD and Mercedes-Benz, has collaborated with marketing and communications giant WPP and NVIDIA Omniverse Cloud to build and deploy its next generation of car configurators, NVIDIA founder and CEO Jensen Huang announced at SIGGRAPH. WPP is using Omniverse Cloud — a platform for developing, deploying and managing…
( 5
min )
Artificial intelligence (AI) adoption is accelerating across industries and use cases. Recent scientific breakthroughs in deep learning (DL), large language models (LLMs), and generative AI are allowing customers to use advanced state-of-the-art solutions with almost human-like performance. These complex models often require hardware acceleration because it enables not only faster training but also faster inference […]
( 13
min )
Microsoft Azure users can now turn to the latest NVIDIA accelerated computing technology to train and deploy their generative AI applications. Available today, the Microsoft Azure ND H100 v5 VMs, using NVIDIA H100 Tensor Core GPUs and NVIDIA Quantum-2 InfiniBand networking, enable scaling of generative AI, high performance computing (HPC) and other applications with a…
( 5
min )
The video gaming industry has an estimated user base of over 3 billion worldwide. It consists of a massive number of players virtually interacting with each other every single day. Unfortunately, as in the real world, not all players communicate appropriately and respectfully. In an effort to create and maintain a socially responsible gaming environment, AWS […]
( 13
min )
Predictions from the OncoNPC model could enable doctors to choose targeted treatments for difficult-to-treat tumors.
( 9
min )
It’s incredible how many organizations utilize Generative AI (GenAI) and Large Language Models (LLMs) to enhance their information assembly, integration, and application abilities. These GenAI technologies have been applied in various areas, from drafting legal documents and resolving service issues to coding software applications and (er, um) writing blog posts. The potential uses of GenAI…
The post Integrating GenAI into “Thinking Like a Data Scientist” Methodology – Part I appeared first on Data Science Central.
( 23
min )
We introduce OpenFlamingo, a family of autoregressive vision-language models
ranging from 3B to 9B parameters. OpenFlamingo is an ongoing effort to produce
an open-source replication of DeepMind's Flamingo models. On seven
vision-language datasets, OpenFlamingo models average between 80% and 89% of
corresponding Flamingo performance. This technical report describes our models,
training data, hyperparameters, and evaluation suite. We share our models and
code at https://github.com/mlfoundations/open_flamingo.
( 2
min )
Integrating knowledge across different domains is an essential feature of
human learning. Learning paradigms such as transfer learning, meta learning,
and multi-task learning reflect the human learning process by exploiting the
prior knowledge for new tasks, encouraging faster learning and good
generalization for new tasks. This article gives a detailed view of these
learning paradigms and their comparative analysis. The weakness of one learning
algorithm turns out to be a strength of another, and thus merging them is a
prevalent trait in the literature. There are numerous research papers that
focus on each of these learning paradigms separately and provide a
comprehensive overview of them. However, this article provides a review of
research studies that combine (two of) these learning algorithms. This survey
describes how these techniques are combined to solve problems in many different
fields of study, including computer vision, natural language processing,
hyperspectral imaging, and many more, in the supervised setting only. As a
result, the global generic learning network, an amalgamation of meta learning,
transfer learning, and multi-task learning, is introduced here, along with some
open
research questions and future research directions in the multi-task setting.
( 3
min )
Fine-tuning language models in a downstream task is the standard approach for
many state-of-the-art methodologies in the field of NLP. However, when the
distribution between the source task and the target task drifts, e.g., in
conversational environments, these gains tend to be diminished. This article
proposes a sequence of pre-training steps (a curriculum) guided by "data
hacking" and grammar analysis that allows further gradual adaptation between
pre-training distributions. In our experiments, our method achieves a
considerable improvement over other known pre-training approaches on the
MultiWoZ task.
( 2
min )
ChatMOF is an autonomous Artificial Intelligence (AI) system that is built to
predict and generate metal-organic frameworks (MOFs). By leveraging a
large-scale language model (gpt-3.5-turbo), ChatMOF extracts key details from
textual inputs and delivers appropriate responses, thus eliminating the
necessity for rigid structured queries. The system is comprised of three core
components (i.e. an agent, a toolkit, and an evaluator) and it forms a robust
pipeline that manages a variety of tasks, including data retrieval, property
prediction, and structure generation. The study further explores the merits and
constraints of using large language model (LLM)-based AI systems in materials
science and showcases their transformative potential for future advancements.
( 2
min )
Nowadays, autonomous cars are gaining traction due to their numerous
potential applications on battlefields and in resolving a variety of other
real-world challenges. The main goal of our project is to build an autonomous
system using DeepRacer that follows a specific person (for our project, a
soldier) as they move in any direction. The two main components used to
accomplish this project are an optimized Single-Shot Multibox Detection (SSD)
object detection model and a Reinforcement Learning (RL) model. We accomplished
the task using SSD Lite instead of SSD and, at the end, compared the results
among SSD, SSD with a Neural Compute Stick (NCS), and SSD Lite. Experimental
results show that SSD Lite gives the best performance among these three
techniques and exhibits a considerable boost in inference speed (~2-3 times)
without compromising accuracy.
( 2
min )
Continual learning seeks to enable deep learners to train on a series of
tasks of unknown length without suffering from the catastrophic forgetting of
previous tasks. One effective solution is replay, which involves storing a few
previous experiences in memory and replaying them when learning the current
task. However, there is still room for improvement when it comes to selecting
the most informative samples for storage and determining the optimal number of
samples to be stored. This study addresses these issues by comparing the
commonly used reservoir sampling to various alternative population strategies
and by providing a detailed analysis of how to find the optimal number of
stored samples.
( 2
min )
This paper introduces two randomized preconditioning techniques for robustly
solving kernel ridge regression (KRR) problems with a medium to large number of
data points ($10^4 \leq N \leq 10^7$). The first method, RPCholesky
preconditioning, is capable of accurately solving the full-data KRR problem in
$O(N^2)$ arithmetic operations, assuming sufficiently rapid polynomial decay of
the kernel matrix eigenvalues. The second method, KRILL preconditioning, offers
an accurate solution to a restricted version of the KRR problem involving $k
\ll N$ selected data centers at a cost of $O((N + k^2) k \log k)$ operations.
The proposed methods solve a broad range of KRR problems and overcome the
failure modes of previous KRR preconditioners, making them ideal for practical
applications.
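For context, the computational problem these preconditioners target is the
linear system at the heart of KRR. The sketch below (our illustration; it uses
plain conjugate gradients, not the paper's RPCholesky or KRILL constructions)
shows the system being solved; a preconditioner would be supplied through cg's
M argument to keep iteration counts low as $N$ grows.

import numpy as np
from scipy.sparse.linalg import cg

rng = np.random.default_rng(0)
N = 500
X = rng.normal(size=(N, 2))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=N)

# Gaussian kernel matrix; KRR solves (K + lam * N * I) alpha = y.
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
K = np.exp(-sq / 2.0)
lam = 1e-3
A = K + lam * N * np.eye(N)

# Plain conjugate gradients; a preconditioner would be passed via M.
alpha, info = cg(A, y)
assert info == 0                          # 0 means CG converged
predict = lambda x: np.exp(-((X - x) ** 2).sum(axis=1) / 2.0) @ alpha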
( 2
min )
One of the key objects of binary classification is the regression function,
i.e., the conditional expectation of the class labels given the inputs. The
regression function not only defines a Bayes optimal classifier but also
encodes the corresponding misclassification probabilities. The paper
presents a resampling framework to construct exact, distribution-free and
non-asymptotically guaranteed confidence regions for the true regression
function for any user-chosen confidence level. Then, specific algorithms are
suggested to demonstrate the framework. It is proved that the constructed
confidence regions are strongly consistent, that is, any false model is
excluded in the long run with probability one. The exclusion is quantified with
probably approximately correct type bounds, as well. Finally, the algorithms
are validated via numerical experiments, and the methods are compared to
approximate asymptotic confidence ellipsoids.
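For concreteness, with labels $Y \in \{0,1\}$, the objects involved are (a
standard textbook rendering, not notation taken from the paper):
$$r(x) = \mathbb{E}[\,Y \mid X = x\,] = \mathbb{P}(Y = 1 \mid X = x), \qquad
g^*(x) = \mathbb{I}\{r(x) \geq 1/2\},$$
$$\mathbb{P}\big(g^*(X) \neq Y \mid X = x\big) = \min\{r(x),\, 1 - r(x)\},$$
so a confidence region for $r$ simultaneously constrains the Bayes classifier
and its conditional error.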
( 2
min )
Bayes estimators are well known to provide a means to incorporate prior
knowledge that can be expressed in terms of a single prior distribution.
However, when this knowledge is too vague to express with a single prior, an
alternative approach is needed. Gamma-minimax estimators provide such an
approach. These estimators minimize the worst-case Bayes risk over a set
$\Gamma$ of prior distributions that are compatible with the available
knowledge. Traditionally, Gamma-minimaxity is defined for parametric models. In
this work, we define Gamma-minimax estimators for general models and propose
adversarial meta-learning algorithms to compute them when the set of prior
distributions is constrained by generalized moments. Accompanying convergence
guarantees are also provided. We also introduce a neural network class that
provides a rich, but finite-dimensional, class of estimators from which a
Gamma-minimax estimator can be selected. We illustrate our method in two
settings, namely entropy estimation and a prediction problem that arises in
biodiversity studies.
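In standard notation (a sketch of the defining optimization, not necessarily
the paper's exact formulation), a $\Gamma$-minimax estimator $\delta^*$ solves
$$\delta^* \in \operatorname*{arg\,min}_{\delta}\; \sup_{\pi \in \Gamma}\;
r(\pi, \delta), \qquad r(\pi, \delta) = \int R(\theta, \delta)\, d\pi(\theta),$$
where $R$ is the risk function and $r$ the Bayes risk; the adversarial
meta-learning algorithms treat this as a game between an estimator network and
an adversary choosing $\pi \in \Gamma$.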
( 2
min )
Data preparation is a critical step in any data-driven project, and having the right tools can greatly enhance operational efficiency. Amazon SageMaker Data Wrangler reduces the time it takes to aggregate and prepare tabular and image data for machine learning (ML) from weeks to minutes. With SageMaker Data Wrangler, you can simplify the process of […]
( 10
min )
Amazon Kendra is a highly accurate and simple-to-use intelligent search service powered by machine learning (ML). Amazon Kendra offers a suite of data source connectors to simplify the process of ingesting and indexing your content, wherever it resides. Valuable data in organizations is stored in both structured and unstructured repositories. An enterprise search solution should […]
( 13
min )
This post is co-authored by Daryl Martis, Director of Product, Salesforce Einstein AI. This is the second post in a series discussing the integration of Salesforce Data Cloud and Amazon SageMaker. In Part 1, we show how the Salesforce Data Cloud and Einstein Studio integration with SageMaker allows businesses to access their Salesforce data securely […]
( 13
min )
This post is co-authored by Daryl Martis, Director of Product, Salesforce Einstein AI. We’re excited to announce Amazon SageMaker and Salesforce Data Cloud integration. With this capability, businesses can access their Salesforce data securely with a zero-copy approach using SageMaker and use SageMaker tools to build, train, and deploy AI models. The inference endpoints are […]
( 7
min )
In this two-part discussion, we will discuss two related generative AI megatrends. Background: A recent paper, How Is ChatGPT’s Behavior Changing over Time?, from Stanford University and UC Berkeley claims that the performance of GPT-4 has drifted over time. To make this claim, specific tasks were evaluated (e.g., accuracy of maths) and the results…
The post Generative AI megatrends: implications of GPT-4 drift and open source models – part one appeared first on Data Science Central.
( 19
min )
AI Weirdness: the strange side of machine learning
( 2
min )
One pandemic and one generative AI revolution later, NVIDIA founder and CEO Jensen Huang returns to the SIGGRAPH stage next week to deliver a live keynote at the world’s largest professional graphics conference. The address, slated for Tuesday, Aug. 8, at 8 a.m. PT in Los Angeles, will feature an exclusive look at some of…
( 4
min )
Data classification, extraction, and analysis can be challenging for organizations that deal with volumes of documents. Traditional document processing solutions are manual, expensive, error prone, and difficult to scale. AWS intelligent document processing (IDP), with AI services such as Amazon Textract, allows you to take advantage of industry-leading machine learning (ML) technology to quickly and […]
( 10
min )
Amazon SageMaker Canvas is a visual interface that enables business analysts to generate accurate machine learning (ML) predictions on their own, without requiring any ML experience or having to write a single line of code. SageMaker Canvas’s intuitive user interface lets business analysts browse and access disparate data sources in the cloud or on premises, […]
( 5
min )
Computer vision (CV) is one of the most common applications of machine learning (ML) and deep learning. Use cases range from self-driving cars, content moderation on social media platforms, cancer detection, and automated defect detection. Amazon Rekognition is a fully managed service that can perform CV tasks like object detection, video segment detection, content moderation, […]
( 11
min )
Introduction: Data Science is a vast field that incorporates several processes. From problem definition to data collection and data cleaning to data visualization, a lot of things are included in the entire data science project development process. Data Scientists are especially responsible for these tasks. They are expert professionals who are well-versed with various data…
The post How can Data Scientists use ChatGPT for developing Machine Learning Models? appeared first on Data Science Central.
( 20
min )
Goran Vuksic is the brain behind a project to build a real-world pit droid, a type of Star Wars bot that repairs and maintains podracers which zoom across the much-loved film series. The edge AI Jedi used an NVIDIA Jetson Orin Nano Developer Kit as the brain of the droid itself. The devkit enables the…
( 6
min )
To grow and succeed, organizations must continuously focus on technical skills development, especially in rapidly advancing areas of technology, such as generative AI and the creation of 3D virtual worlds. NVIDIA Training, which equips teams with skills for the age of AI, high performance computing and industrial digitalization, is announcing new courses that cover these…
( 6
min )
The Ultimate upgrade is complete — GeForce NOW Ultimate performance is now streaming all throughout North America and Europe, delivering RTX 4080-class power for gamers across these regions. Celebrate this month with 41 new games, on top of the full release of Baldur’s Gate 3 and the first Bethesda titles coming to the cloud as…
( 8
min )
Researcher Jina Suh and manager Shamsi Iqbal are longtime collaborators. Learn how their history of working together and their unique perspectives are informing their development of tools to support decision-making for organizational leaders.
The post Collaborators: Data-driven decision-making with Jina Suh and Shamsi Iqbal appeared first on Microsoft Research.
( 32
min )
SageMaker Distribution is a pre-built Docker image containing many popular packages for machine learning (ML), data science, and data visualization. This includes deep learning frameworks like PyTorch, TensorFlow, and Keras; popular Python packages like NumPy, scikit-learn, and pandas; and IDEs like JupyterLab. In addition to this, SageMaker Distribution supports conda, micromamba, and pip as Python […]
( 6
min )
Amazon Kendra is an intelligent search service powered by machine learning (ML). Amazon Kendra reimagines search for your websites and applications so your employees and customers can easily find the content they are looking for, even when it’s scattered across multiple locations and content repositories within your organization. Amazon Kendra supports a variety of document […]
( 13
min )
In today’s rapidly evolving healthcare landscape, doctors are faced with vast amounts of clinical data from various sources, such as caregiver notes, electronic health records, and imaging reports. This wealth of information, while essential for patient care, can also be overwhelming and time-consuming for medical professionals to sift through and analyze. Efficiently summarizing and extracting […]
( 13
min )
Advertising agencies can use generative AI and text-to-image foundation models to create innovative ad creatives and content. In this post, we demonstrate how you can generate new images from existing base images using Amazon SageMaker, a fully managed service to build, train, and deploy ML models at scale. With this solution, businesses large and […]
( 8
min )
Principal NVIDIA artist and 3D expert Michael Johnson creates highly detailed art that’s both technically impressive and emotionally resonant.
( 6
min )
NVIDIA joined Pixar, Adobe, Apple and Autodesk today to found the Alliance for OpenUSD, a major leap toward unlocking the next era of 3D graphics, design and simulation. The group will standardize and extend OpenUSD, the open-source Universal Scene Description framework that’s the foundation of interoperable 3D applications and projects ranging from visual effects to Read article >
( 6
min )
Drug development is a complex and long process that involves screening thousands of drug candidates and using computational or experimental methods to evaluate leads. According to McKinsey, a single drug can take 10 years and cost an average of $2.6 billion to go through disease target identification, drug screening, drug-target validation, and eventual commercial launch. […]
( 15
min )
If you are a business analyst, understanding customer behavior is probably one of the most important things you care about. Understanding the reasons and mechanisms behind customer purchase decisions can facilitate revenue growth. However, the loss of customers (commonly referred to as customer churn) always poses a risk. Gaining insights into why customers leave can […]
( 14
min )
In an age where efficiency is king, manufacturing firms are in a constant race to outshine their competition. Imagine if you could boost productivity, slash downtime, and cut costs all at once. Sounds like a dream, right? The good news is, this isn’t a fantasy. It’s achievable through Internet of Things (IoT) solutions. IoT solutions…
The post Increase efficiency of manufacturing operations with IoT solutions appeared first on Data Science Central.
( 21
min )
“If you start by creating your data, then it’s like you are piling up some value or you’re creating some assets,” WordLift CEO Andrea Volpini told me in our recent FAIR Data Forecast interview. Volpini’s an advocate for adding structured data such as Schema.org to your content. That way, the content becomes logically connected and…
The post Human-centered data networking with interpersonal knowledge graphs appeared first on Data Science Central.
( 21
min )
“PhotoGuard,” developed by MIT CSAIL researchers, prevents unauthorized image manipulation, safeguarding authenticity in the era of advanced generative models.
( 10
min )
One of the reasons that I moved back to Iowa last year was that I saw an opportunity to work with local educational institutions to create an AI Institute for organizations in middle America that either get overlooked in the AI conversation or are unsure what AI means to them. I wanted to reduce the…
The post Introduction to “AI & Data Literacy: Empowering Citizens of Data Science” appeared first on Data Science Central.
( 22
min )
Decoding EEG signals for imagined speech is a challenging task due to the
high-dimensional nature of the data and low signal-to-noise ratio. In recent
years, denoising diffusion probabilistic models (DDPMs) have emerged as
promising approaches for representation learning in various domains. Our study
proposes a novel method for decoding EEG signals for imagined speech using
DDPMs and a conditional autoencoder named Diff-E. Results indicate that Diff-E
significantly improves the accuracy of decoding EEG signals for imagined speech
compared to traditional machine learning techniques and baseline models. Our
findings suggest that DDPMs can be an effective tool for EEG signal decoding,
with potential implications for the development of brain-computer interfaces
that enable communication through imagined speech.
( 2
min )
Optimization methods are essential in solving complex problems across various
domains. In this research paper, we introduce a novel optimization method
called Gaussian Crunching Search (GCS). Inspired by the behaviour of particles
in a Gaussian distribution, GCS aims to efficiently explore the solution space
and converge towards the global optimum. We present a comprehensive analysis of
GCS, including its working mechanism and potential applications. Through
experimental evaluations and comparisons with existing optimization methods, we
highlight the advantages and strengths of GCS. This research paper serves as a
valuable resource for researchers, practitioners, and students interested in
optimization, providing insights into the development and potential of Gaussian
Crunching Search as a new and promising approach.
( 2
min )
We prove that black-box variational inference (BBVI) with control variates,
particularly the sticking-the-landing (STL) estimator, converges at a geometric
(traditionally called "linear") rate under perfect variational family
specification. In particular, we prove a quadratic bound on the gradient
variance of the STL estimator, one which encompasses misspecified variational
families. Combined with previous works on the quadratic variance condition,
this directly implies convergence of BBVI with the use of projected stochastic
gradient descent. We also improve existing analysis on the regular closed-form
entropy gradient estimators, which enables comparison against the STL estimator
and provides explicit non-asymptotic complexity guarantees for both.
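For reference, the STL estimator of Roeder et al. differs from the plain
reparameterization gradient of the ELBO only in that the variational parameters
inside the entropy term are held fixed (a standard rendering; the paper's
notation may differ). With reparameterization $z = T_\lambda(\varepsilon)$,
$\varepsilon \sim \varphi$,
$$\widehat{\nabla}^{\mathrm{STL}}_\lambda = \nabla_\lambda \Big[ \log
p\big(x, T_\lambda(\varepsilon)\big) - \log q_\nu\big(T_\lambda(\varepsilon)\big)
\Big] \Big|_{\nu = \lambda},$$
which drops the score term $\nabla_\lambda \log q_\lambda(z)$; under perfect
variational family specification its variance vanishes at the optimum, which is
what enables the geometric rate.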
( 2
min )
This paper studies the estimation and inference of treatment histories in
panel data settings when treatments change dynamically over time.
We propose a method that allows for (i) treatments to be assigned dynamically
over time based on high-dimensional covariates, past outcomes and treatments;
(ii) outcomes and time-varying covariates to depend on treatment trajectories;
(iii) heterogeneity of treatment effects.
Our approach recursively projects potential outcomes' expectations on past
histories. It then controls the bias by balancing dynamically observable
characteristics. We study the asymptotic and numerical properties of the
estimator and illustrate the benefits of the procedure in an empirical
application.
( 2
min )
In various fields, such as traffic management, law enforcement, and parking management, license plate recognition is a crucial application of computer vision that is used to analyze license plates. In this article, we will review the Chinese City Parking Dataset (CCPD), which is one of the most widely used computer vision datasets for tasks that… Read More »Understanding license plate recognition with the CCPD computer vision datasets
The post Understanding license plate recognition with the CCPD computer vision datasets appeared first on Data Science Central.
( 20
min )
From smart factories to next-generation railway systems, developers and enterprises across the world are racing to fuel industrial digitalization opportunities at every scale. Key to this is the open-source Universal Scene Description (USD) framework, or OpenUSD, along with metaverse applications powered by AI. OpenUSD, originally developed by Pixar for large-scale feature film pipelines for animation Read article >
( 7
min )
AI is improving ways to power the world by tapping the sun and the wind, along with cutting-edge technologies. The latest episode in the I AM AI video series showcases how artificial intelligence can help optimize solar and wind farms, simulate climate and weather, enhance power grid reliability and resilience, advance carbon capture and power Read article >
( 6
min )
Get ready for Gunfire Games and Gearbox Publishing’s highly anticipated Remnant II, available for members to stream on GeForce NOW at launch. It leads eight new games coming to the cloud gaming platform. Ultimate and Priority members, make sure to grab the Guild Wars 2 rewards, available now through Thursday, Aug. 31. Visit the GeForce Read article >
( 5
min )
Managing server failures at the scale of a cloud platform is challenging. The Hyrax fail-in-place approach reduces the need for immediate repairs and creates a path toward lowering water consumption and carbon emissions in cloud datacenters.
The post A fail-in-place approach for sustainable server operations appeared first on Microsoft Research.
( 12
min )
Today we are excited to announce that Stable Diffusion XL 1.0 (SDXL 1.0) is available for customers through Amazon SageMaker JumpStart. SDXL 1.0 is the latest image generation model from Stability AI. SDXL 1.0 enhancements include native 1024-pixel image generation at a variety of aspect ratios. It’s designed for professional use, and calibrated for high-resolution […]
( 12
min )
The increase in online social activities such as social networking or online gaming is often riddled with hostile or aggressive behavior that can lead to unsolicited manifestations of hate speech, cyberbullying, or harassment. For example, many online gaming communities offer voice chat functionality to facilitate communication among their users. Although voice chat often supports friendly […]
( 8
min )
Generative AI models have been experiencing rapid growth in recent months due to its impressive capabilities in creating realistic text, images, code, and audio. Among these models, Stable Diffusion models stand out for their unique strength in creating high-quality images based on text prompts. Stable Diffusion can generate a wide variety of high-quality images, including […]
( 12
min )
Breakthroughs in artificial intelligence (AI) and machine learning (ML) have been in the headlines for months—and for good reason. The emerging and evolving capabilities of this technology promise new business opportunities for customers across all sectors and industries. But the speed of this revolution has made it harder for organizations and consumers to assess what […]
( 6
min )
Generative AI Foundations on AWS is a new technical deep dive course that gives you the conceptual fundamentals, practical advice, and hands-on guidance to pre-train, fine-tune, and deploy state-of-the-art foundation models on AWS and beyond. Developed by AWS generative AI worldwide foundations lead Emily Webber, this free hands-on course and the supporting GitHub source code […]
( 6
min )
As a pioneer in artificial intelligence and machine learning, AWS is committed to developing and deploying generative AI responsibly. As one of the most transformational innovations of our time, generative AI continues to capture the world’s imagination, and we remain as committed as ever to harnessing it responsibly. With a team of dedicated responsible AI […]
( 5
min )
AWS users can now access the leading performance demonstrated in industry benchmarks of AI training and inference. The cloud giant officially switched on a new Amazon EC2 P5 instance powered by NVIDIA H100 Tensor Core GPUs. The service lets users scale generative AI, high performance computing (HPC) and other applications with a click from a…
( 6
min )
The world increasingly runs on code. Accelerating the work of those who create that code will boost their productivity — and that’s just what AI startup Codeium, a member of NVIDIA’s Inception program for startups, aims to do. On the latest episode of NVIDIA’s AI Podcast, host Noah Kravitz interviewed Codeium founder and CEO Varun…
( 5
min )
Researchers develop a machine-learning technique that can efficiently learn to control a robot, leading to better performance with fewer data.
( 10
min )
Welcome to the exciting world of digital marketing! In this blog, we’ll delve into this thrilling frontier where optimization meets automation and Artificial Intelligence is at the center. No longer must manual labor and guesswork play an essential part in developing effective marketing strategies; with AI’s capabilities now at their disposal, marketers with digital presence…
The post From automation to optimization: How AI is revolutionizing digital marketing campaigns appeared first on Data Science Central.
( 24
min )
With recent advancements in generative AI, there is a lot of discussion happening on how to use generative AI across different industries to solve specific business problems. Generative AI is a type of AI that can create new content and ideas, including conversations, stories, images, videos, and music. It is all backed by very large models […]
( 9
min )
NVIDIA DGX Cloud — which delivers tools that can turn nearly any company into an AI company — is now broadly available, with thousands of NVIDIA GPUs online on Oracle Cloud Infrastructure, as well as NVIDIA infrastructure located in the U.S. and U.K. Unveiled at NVIDIA’s GTC conference in March, DGX Cloud is an AI…
( 5
min )
We’re gonna need a bigger boat this week In the NVIDIA Studio, as Alessandro Mastronardi, senior artist and programmer at BBC Studios, shares heart-stopping shark videos and renders.
( 7
min )
This blog post was co-authored, and includes an introduction, by Zilong Bai, senior natural language processing engineer at Patsnap. You’re likely familiar with the autocomplete suggestion feature when you search for something on Google or Amazon. Although the search terms in these scenarios are pretty common keywords or expressions that we use in daily life, […]
( 9
min )
When deploying Deep Learning models at scale, it is crucial to effectively utilize the underlying hardware to maximize performance and cost benefits. For production workloads requiring high throughput and low latency, the selection of the Amazon Elastic Compute Cloud (EC2) instance, model serving stack, and deployment architecture is very important. Inefficient architecture can lead to […]
( 15
min )
In this blog, I will now focus on generative AI megatrends. By that, I mean trends and underlying trends that could be big in the future – focusing on the technology of LLMs but also the wider impact of LLMs on the economy and society. I will hence identify and follow some key trends…
The post Generative AI megatrends: Are companies using the excuse of AI to get rid of jobs? appeared first on Data Science Central.
( 19
min )
Partial differential equations (PDEs) are important tools to model physical
systems, and including them in machine learning models is an important way of
incorporating physical knowledge. Given any system of linear PDEs with constant
coefficients, we propose a family of Gaussian process (GP) priors, which we
call EPGP, such that all realizations are exact solutions of this system. We
apply the Ehrenpreis-Palamodov fundamental principle, which works as a
non-linear Fourier transform, to construct GP kernels mirroring standard
spectral methods for GPs. Our approach can infer probable solutions of linear
PDE systems from any data such as noisy measurements, or pointwise defined
initial and boundary conditions. Constructing EPGP-priors is algorithmic,
generally applicable, and comes with a sparse version (S-EPGP) that learns the
relevant spectral frequencies and works better for big data sets. We
demonstrate our approach on three families of systems of PDEs, the heat
equation, wave equation, and Maxwell's equations, where we improve upon the
state of the art in computation time and precision, in some experiments by
several orders of magnitude.
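To make the construction concrete for one of these systems (our illustration,
in generic notation): for the heat equation $\partial_t u = \partial_x^2 u$,
every exponential $e^{i\omega x - \omega^2 t}$ is an exact solution, so a GP
with the kernel
$$k\big((x,t),(x',t')\big) = \int e^{\,i\omega(x - x') - \omega^2(t + t')}\,
d\mu(\omega)$$
mixes such solutions against a spectral measure $\mu$ and has realizations that
solve the PDE exactly; S-EPGP replaces the integral with a learned finite sum
over spectral frequencies.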
( 3
min )
Labeling of multivariate biomedical time series data is a laborious and
expensive process. Self-supervised contrastive learning alleviates the need for
large, labeled datasets through pretraining on unlabeled data. However, for
multivariate time series data, the set of input channels often varies between
applications, and most existing work does not allow for transfer between
datasets with different sets of input channels. We propose learning one encoder
to operate on all input channels individually. We then use a message passing
neural network to extract a single representation across channels. We
demonstrate the potential of this method by pretraining our model on a dataset
with six EEG channels and then fine-tuning it on a dataset with two different
EEG channels. We compare models with and without the message passing neural
network across different contrastive loss functions. We show that our method,
combined with the TS2Vec loss, outperforms all other methods in most settings.
( 2
min )
Centrality metrics are vital for network analysis, but selecting the most
appropriate measures for specific applications remains challenging among the
400+ proposed indices. Existing approaches -- model-based, data-driven, and
axiomatic -- have limitations. To address this, we introduce the culling
method, leveraging expert preferences regarding centrality behavior on simple
graphs. It involves forming a set of candidate measures, generating a list of
as small graphs as possible needed to ``separate'' measures from each other,
constructing a decision-tree survey, and identifying the measure consistent
with expert responses. We apply this method to a diverse set of 40
centralities, including new kernel-based measures, and combine it with the
axiomatic approach. Remarkably, only 13 small 1-trees suffice to separate all
40 measures, among which there are pairs of close ones. The culling method
offers a low-cost solution in terms of labor and time, complements existing
methods for measure selection, and reveals important peculiarities of
centrality measures.
( 2
min )
We consider the problem of learning from data corrupted by
underrepresentation bias, where positive examples are filtered from the data at
different, unknown rates for a fixed number of sensitive groups. We show that
with a small amount of unbiased data, we can efficiently estimate the
group-wise drop-out parameters, even in settings where intersectional group
membership makes learning each intersectional rate computationally infeasible.
Using this estimate for the group-wise drop-out rate, we construct a
re-weighting scheme that allows us to approximate the loss of any hypothesis on
the true distribution, even if we only observe the empirical error on a biased
sample. Finally, we present an algorithm encapsulating this learning and
re-weighting process, and we provide strong PAC-style guarantees that, with
high probability, our estimate of the risk of the hypothesis over the true
distribution will be arbitrarily close to the true risk.
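As an illustration of the kind of re-weighting involved (our sketch of the
natural scheme, in hypothetical notation rather than the paper's): if positives
in group $g$ survive filtering with probability $1 - \beta_g$, then weighting
observed positives by $w(x, 1) = 1/(1 - \hat{\beta}_{g(x)})$ and negatives by
$w(x, 0) = 1$ gives
$$\mathbb{E}_{\mathrm{biased}}\big[\, w(X, Y)\, \ell(h(X), Y)\,\big] =
\tfrac{1}{Z}\, \mathbb{E}_{\mathrm{true}}\big[\, \ell(h(X), Y)\,\big],$$
where $Z$ is the overall probability that an example survives filtering, so the
weighted empirical loss recovers the true-distribution loss up to a constant.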
( 2
min )
Adversarial examples are inputs to machine learning models that an attacker
has intentionally designed to confuse the model into making a mistake. Such
examples pose a serious threat to the applicability of machine-learning-based
systems, especially in life- and safety-critical domains. To address this
problem, the area of adversarial robustness investigates mechanisms behind
adversarial attacks and defenses against these attacks. This survey reviews a
particular subset of this literature that focuses on investigating properties
of training data in the context of model robustness under evasion attacks. It
first summarizes the main properties of data leading to adversarial
vulnerability. It then discusses guidelines and techniques for improving
adversarial robustness by enhancing the data representation and learning
procedures, as well as techniques for estimating robustness guarantees given
particular data. Finally, it discusses gaps of knowledge and promising future
research directions in this area.
( 2
min )
Robust reinforcement learning (RL) aims to find a policy that optimizes the
worst-case performance in the face of uncertainties. In this paper, we focus on
action robust RL with the probabilistic policy execution uncertainty, in which,
instead of always carrying out the action specified by the policy, the agent
will take the action specified by the policy with probability $1-\rho$ and an
alternative adversarial action with probability $\rho$. We establish the
existence of an optimal policy on the action robust MDPs with probabilistic
policy execution uncertainty and provide the action robust Bellman optimality
equation for its solution. Furthermore, we develop the Action Robust
Reinforcement Learning with Certificates (ARRLC) algorithm, which achieves
minimax optimal regret and sample complexity. Finally, we conduct numerical
experiments to
validate our approach's robustness, demonstrating that ARRLC outperforms
non-robust RL algorithms and converges faster than the robust TD algorithm in
the presence of action perturbations.
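The execution model itself is easy to state in code. Below is a minimal sketch
(our illustration, assuming a hypothetical gym-style environment, with a
uniform-random stand-in for the adversary; ARRLC reasons about a worst-case
adversary instead):

import random

class ActionRobustWrapper:
    """With probability rho, the executed action is replaced by an
    adversarial alternative; otherwise the policy's action is carried out."""
    def __init__(self, env, rho):
        self.env, self.rho = env, rho
        self._state = None

    def reset(self):
        self._state = self.env.reset()
        return self._state

    def step(self, action):
        if random.random() < self.rho:
            # Placeholder adversary: a uniform-random alternative action.
            action = self.env.action_space.sample()
        obs, reward, done, info = self.env.step(action)
        self._state = obs
        return obs, reward, done, info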
( 2
min )
Attention is the core mechanism of today's most used architectures for
natural language processing and has been analyzed from many perspectives,
including its effectiveness for machine translation-related tasks. Among these
studies, attention has proved to be a useful source of insight into word
alignment, even when the input text is replaced with audio segments, as in the
case of the speech translation (ST) task. In this
paper, we propose AlignAtt, a novel policy for simultaneous ST (SimulST) that
exploits the attention information to generate source-target alignments that
guide the model during inference. Through experiments on the 8 language pairs
of MuST-C v1.0, we show that AlignAtt outperforms previous state-of-the-art
SimulST policies applied to offline-trained models with gains in terms of BLEU
of 2 points and latency reductions ranging from 0.5s to 0.8s across the 8
languages.
( 2
min )
We propose simple nonparametric estimators for mediated and time-varying dose
response curves based on kernel ridge regression. By embedding Pearl's
mediation formula and Robins' g-formula with kernels, we allow treatments,
mediators, and covariates to be continuous in general spaces, and also allow
for nonlinear treatment-confounder feedback. Our key innovation is a
reproducing kernel Hilbert space technique called sequential kernel embedding,
which we use to construct simple estimators for complex causal estimands. Our
estimators preserve the generality of classic identification while also
achieving nonasymptotic uniform rates. In nonlinear simulations with many
covariates, we demonstrate strong performance. We estimate mediated and
time-varying dose response curves of the US Job Corps, and clean data that may
serve as a benchmark in future work. We extend our results to mediated and
time-varying treatment effects and counterfactual distributions, verifying
semiparametric efficiency and weak convergence.
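For reference, the mediation formula being embedded (in generic notation, with
treatment $D$, mediator $M$, and covariates $X$) identifies the nested
potential outcome as
$$\mathbb{E}\big[\,Y\big(d, M(d')\big)\,\big] = \iint \mathbb{E}\big[\,Y \mid
D = d,\, M = m,\, X = x\,\big]\; d\mathbb{P}(m \mid D = d', X = x)\;
d\mathbb{P}(x),$$
and the sequential kernel embedding evaluates such nested conditional
expectations by composing conditional mean embeddings in the RKHS.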
( 2
min )
The drastic growth of electric vehicles and photovoltaics can introduce new
challenges, such as electrical current congestion and voltage limit violations
due to peak load demands. These issues can be mitigated by controlling the
operation of electric vehicles, i.e., smart charging. Centralized smart charging
solutions have already been proposed in the literature. But such solutions may
lack scalability and suffer from inherent drawbacks of centralization, such as
a single point of failure, and data privacy concerns. Decentralization can help
tackle these challenges. In this paper, a fully decentralized smart charging
system is proposed using the philosophy of adaptive multi-agent systems. The
proposed system utilizes multi-armed bandit learning to handle uncertainties in
the system. The presented system is decentralized, scalable, real-time,
model-free, and takes fairness among different players into account. A detailed
case study is also presented for performance evaluation.
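As a toy illustration of the learning component (our sketch; the charging rates
and the penalty signal are hypothetical stand-ins, not the paper's design),
each charger can treat candidate charging rates as arms of a multi-armed bandit
and select rates with an epsilon-greedy rule:

import random

RATES_KW = [3.7, 7.4, 11.0, 22.0]        # candidate charging rates (arms)
counts = [0] * len(RATES_KW)
values = [0.0] * len(RATES_KW)           # running mean reward per arm
EPS = 0.1

def penalty(arm):
    # Hypothetical stand-in for a congestion/voltage penalty signal.
    return random.uniform(0.0, RATES_KW[arm])

def choose_arm():
    # Explore with probability EPS, otherwise exploit the best estimate.
    if random.random() < EPS:
        return random.randrange(len(RATES_KW))
    return max(range(len(RATES_KW)), key=lambda i: values[i])

def update(arm, reward):
    counts[arm] += 1
    values[arm] += (reward - values[arm]) / counts[arm]  # incremental mean

for _ in range(1000):                    # simulated charging decisions
    arm = choose_arm()
    update(arm, RATES_KW[arm] - penalty(arm))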
( 2
min )
Time series anomaly detection is crucial for industrial monitoring services
that handle a large volume of data, aiming to ensure reliability and optimize
system performance. Existing methods often require extensive labeled resources
and manual parameter selection, highlighting the need for automation. This
paper proposes a comprehensive framework for automatic parameter optimization
in time series anomaly detection models. The framework introduces three
optimization targets: prediction score, shape score, and sensitivity score,
which can be easily adapted to different model backbones without prior
knowledge or manual labeling efforts. The proposed framework has been
successfully applied online for over six months, serving more than 50,000 time
series every minute. It simplifies the user's experience by requiring only an
expected sensitive value, offering a user-friendly interface, and achieving
desired detection results. Extensive evaluations conducted on public datasets
and comparison with other methods further confirm the effectiveness of the
proposed framework.
( 2
min )
Performance of a pre-trained semantic segmentation model is likely to
substantially decrease on data from a new domain. We show a pre-trained model
can be adapted to unlabelled target domain data by calculating soft-label
prototypes under the domain shift and making predictions according to the
prototype closest to the vector with predicted class probabilities. The
proposed adaptation procedure is fast, comes almost for free in terms of
computational resources and leads to considerable performance improvements. We
demonstrate the benefits of such label calibration on the highly practical
synthetic-to-real semantic segmentation problem.
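The adaptation step can be sketched in a few lines (our reading of the
procedure described above, with hypothetical array names; details such as how
prototypes are maintained under the domain shift will differ):

import numpy as np

# probs: (N, C) class probabilities predicted on unlabelled target data.
rng = np.random.default_rng(0)
probs = rng.dirichlet(np.ones(5), size=1000)  # stand-in for model outputs
C = probs.shape[1]

# Soft-label prototypes: the mean predicted-probability vector of each
# pseudo-class, with pseudo-labels given by the model's own argmax.
pseudo = probs.argmax(axis=1)
prototypes = np.stack([probs[pseudo == c].mean(axis=0) for c in range(C)])

# Adapted prediction: the pseudo-class whose prototype is closest (in
# Euclidean distance) to the predicted probability vector.
dists = ((probs[:, None, :] - prototypes[None, :, :]) ** 2).sum(axis=2)
adapted = dists.argmin(axis=1)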
( 2
min )
Granger causality (GC) is often considered not an actual form of causality.
Still, it is arguably the most widely used method to assess the predictability
of a time series from another one. Granger causality has been widely used in
many applied disciplines, from neuroscience and econometrics to Earth sciences.
We revisit GC under a graphical perspective of state-space models. For that, we
use GraphEM, a recently presented expectation-maximisation algorithm for
estimating the linear matrix operator in the state equation of a
linear-Gaussian state-space model. Lasso regularisation is included in the
M-step, which is solved using a proximal splitting Douglas-Rachford algorithm.
Experiments on toy examples and challenging climate problems illustrate the
benefits of the proposed model and inference technique over standard Granger
causality methods.
( 2
min )
We present FACADE, a novel probabilistic and geometric framework designed for
unsupervised mechanistic anomaly detection in deep neural networks. Its primary
goal is advancing the understanding and mitigation of adversarial attacks.
FACADE aims to generate probabilistic distributions over circuits, which
provide critical insights into their contribution to changes in the manifold
properties of pseudo-classes, or high-dimensional modes in activation space,
yielding a powerful tool for uncovering and combating adversarial attacks. Our
approach seeks to improve model robustness and enhance scalable model
oversight, and shows promising applications in real-world deployment settings.
( 2
min )
In this work we introduce $\nu^2$-Flows, an extension of the $\nu$-Flows
method to final states containing multiple neutrinos. The architecture can
natively scale for all combinations of object types and multiplicities in the
final state for any desired neutrino multiplicities. In $t\bar{t}$ dilepton
events, the momenta of both neutrinos and correlations between them are
reconstructed more accurately than when using the most popular standard
analytical techniques, and solutions are found for all events. Inference time
is significantly faster than competing methods, and can be reduced further by
evaluating in parallel on graphics processing units. We apply $\nu^2$-Flows to
$t\bar{t}$ dilepton events and show that the per-bin uncertainties in unfolded
distributions are much closer to the limit of performance set by perfect
neutrino reconstruction than standard techniques. For the chosen double
differential observables, $\nu^2$-Flows improves the statistical precision in
each bin by a factor of 1.5 to 2 in comparison to the Neutrino Weighting
method, and by up to a factor of four in comparison to the Ellipse approach.
( 2
min )
Data-driven optimization uses contextual information and machine learning
algorithms to find solutions to decision problems with uncertain parameters.
While a vast body of work is dedicated to interpreting machine learning models
in the classification setting, explaining decision pipelines involving learning
algorithms remains unaddressed. This lack of interpretability can block the
adoption of data-driven solutions as practitioners may not understand or trust
the recommended decisions. We bridge this gap by introducing a counterfactual
explanation methodology tailored to explain solutions to data-driven problems.
We introduce two classes of explanations and develop methods to find nearest
explanations for random forest and nearest-neighbor predictors. We demonstrate
our approach by explaining key problems in operations management such as
inventory management and routing.
( 2
min )
In this article we describe an efficient approach to guiding language model
text generation with regular expressions and context-free grammars. Our
approach adds little to no overhead to the token sequence generation process,
and makes guided generation feasible in practice. An implementation is provided
in the open source Python library Outlines.
( 2
min )
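The following is a naive sketch of regex-guided decoding, not the library's
implementation: it rejects any candidate token that would make a full match
impossible, using the third-party regex module's partial matching. The point
of the paper is precisely to avoid this O(vocabulary) scan at every step by
indexing the vocabulary against the pattern's finite-state machine up front.

    import regex  # third-party module; unlike `re`, it supports partial matches

    def allowed_next_tokens(prefix, vocab, pattern):
        """Return the tokens that keep `prefix` extendable to a full match."""
        compiled = regex.compile(pattern)
        return [tok for tok in vocab
                if compiled.fullmatch(prefix + tok, partial=True)]

    # e.g. with pattern r"\d+\.\d+" and prefix "12", the tokens "3" and "."
    # remain allowed, while "a" is masked out.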
The research on code-mixed data is limited due to the unavailability of
dedicated code-mixed datasets and pre-trained language models. In this work, we
focus on the low-resource Indian language Marathi which lacks any prior work in
code-mixing. We present L3Cube-MeCorpus, a large code-mixed Marathi-English
(Mr-En) corpus with 10 million social media sentences for pretraining. We also
release L3Cube-MeBERT and MeRoBERTa, code-mixed BERT-based transformer models
pre-trained on MeCorpus. Furthermore, for benchmarking, we present three
supervised datasets MeHate, MeSent, and MeLID for downstream tasks like
code-mixed Mr-En hate speech detection, sentiment analysis, and language
identification, respectively. These evaluation datasets each consist of
approximately 12,000 manually annotated Marathi-English code-mixed tweets. Ablations
show that the models trained on this novel corpus significantly outperform the
existing state-of-the-art BERT models. This is the first work that presents
artifacts for code-mixed Marathi research. All datasets and models are publicly
released at https://github.com/l3cube-pune/MarathiNLP .
( 2
min )
Active learning is a well-studied approach to learning formal specifications,
such as automata. In this work, we extend active specification learning by
proposing a novel framework that strategically requests a combination of
membership labels and pair-wise preferences, a popular alternative to
membership labels. Combining the two modalities allows for a more flexible
approach than prior active specification learning, which relied on membership
labels alone. We instantiate our framework in two different domains,
demonstrating the generality of our approach. Our results suggest that
learning from both modalities allows us to robustly and conveniently identify
specifications.
( 2
min )
This note describes a new approach to classifying graphs that leverages graph
generative models (GGM). Assuming a GGM that defines a joint probability
distribution over graphs and their class labels, I derive classification
formulas for the probability of a class label given a graph. A new conditional
ELBO can be used to train a generative graph auto-encoder model for
discrimination. While leveraging generative models for classification has been
well explored for non-relational i.i.d. data, to my knowledge this is a novel
approach to graph classification.
( 2
min )
Flow map learning (FML), in conjunction with deep neural networks (DNNs), has
shown promise for data-driven modeling of unknown dynamical systems. A
remarkable feature of FML is that it is capable of producing accurate
predictive models for partially observed systems, even when their exact
mathematical models do not exist. In this paper, we present an overview of the
FML framework, along with the important computational details for its
successful implementation. We also present a set of well-defined benchmark
problems for learning unknown dynamical systems. All the numerical details of
these problems are presented, along with their FML results, to ensure that the
problems are accessible for cross-examination and the results are reproducible.
( 2
min )
Deep Learners (DLs) are the state-of-the-art predictive mechanism with
applications in many fields requiring complex high dimensional data processing.
Although conventional DLs get trained via gradient descent with
back-propagation, Kalman Filter (KF)-based techniques that do not need gradient
computation have been developed to approximate DLs. We propose a multi-arm
extension of a KF-based DL approximator that can mimic DL when the sample size
is too small to train a multi-arm DL. The proposed Matrix Ensemble Kalman
Filter-based multi-arm ANN (MEnKF-ANN) also performs explicit model stacking
that becomes relevant when the training sample has an unequal-size feature set.
Our proposed technique can approximate Long Short-term Memory (LSTM) Networks
and attach uncertainty to the predictions obtained from these LSTMs with
desirable coverage. We demonstrate how MEnKF-ANN can "adequately" approximate
an LSTM network trained to classify what carbohydrate substrates are digested
and utilized by a microbiome sample whose genomic sequences consist of
polysaccharide utilization loci (PULs) and their encoded genes.
( 2
min )
Source-free domain adaptation has become popular because of its practical
usefulness and no need to access source data. However, the adaptation process
still takes a considerable amount of time and is predominantly based on
optimization that relies on back-propagation. In this work we present a simple
feed-forward approach that challenges the need for back-propagation based
adaptation. Our approach is based on computing prototypes of classes under the
domain shift using a pre-trained model. It achieves strong improvements in
accuracy compared to the pre-trained model and requires only a small fraction
of the time required by existing domain adaptation methods.
( 2
min )
We propose simple nonparametric estimators for mediated and time-varying dose
response curves based on kernel ridge regression. By embedding Pearl's
mediation formula and Robins' g-formula with kernels, we allow treatments,
mediators, and covariates to be continuous in general spaces, and also allow
for nonlinear treatment-confounder feedback. Our key innovation is a
reproducing kernel Hilbert space technique called sequential kernel embedding,
which we use to construct simple estimators for complex causal estimands. Our
estimators preserve the generality of classic identification while also
achieving nonasymptotic uniform rates. In nonlinear simulations with many
covariates, we demonstrate strong performance. We estimate mediated and
time-varying dose response curves of the US Job Corps, and clean data that may
serve as a benchmark in future work. We extend our results to mediated and
time-varying treatment effects and counterfactual distributions, verifying
semiparametric efficiency and weak convergence.
( 2
min )
The tuning of stochastic gradient algorithms (SGAs) for optimization and
sampling is often based on heuristics and trial-and-error rather than
generalizable theory. We address this theory--practice gap by characterizing
the large-sample statistical asymptotics of SGAs via a joint
step-size--sample-size scaling limit. We show that iterate averaging with a
large fixed step size is robust to the choice of tuning parameters and
asymptotically has covariance proportional to that of the MLE sampling
distribution. We also prove a Bernstein--von Mises-like theorem to guide
tuning, including for generalized posteriors that are robust to model
misspecification. Numerical experiments validate our results and
recommendations in realistic finite-sample regimes. Our work lays the
foundation for a systematic analysis of other stochastic gradient Markov chain
Monte Carlo algorithms for a wide range of models.
( 2
min )
Rodents such as rats and mice are associated with a number of health risks and are known to spread more than 35 diseases. Identifying regions of high rodent activity can help local authorities and pest control organizations plan for interventions effectively and exterminate the rodents. In this post, we show how to monitor and visualize […]
( 7
min )
Microsoft Research is proud to be a sponsor of ICML 2023! From audio classification to privacy estimation and more, explore conference highlights in our latest blog post.
The post Microsoft at ICML 2023: Discoveries and advancements in machine learning appeared first on Microsoft Research.
( 10
min )
This is a guest post by Mario Namtao Shianti Larcher, Head of Computer Vision at Enel. Enel, which started as Italy’s national entity for electricity, is today a multinational company present in 32 countries and the first private network operator in the world with 74 million users. It is also recognized as the first renewables […]
( 8
min )
Artificial intelligence (AI) has become an important and popular topic in the technology community. As AI has evolved, we have seen different types of machine learning (ML) models emerge. One approach, known as ensemble modeling, has been rapidly gaining traction among data scientists and practitioners. In this post, we discuss what ensemble models are and […]
( 12
min )
It’s a party this GFN Thursday with several newly launched titles streaming on GeForce NOW. Revel in gaming goodness with Xenonauts 2, Viewfinder and Techtonica, among the four new games joining the cloud this week. Portal fans, stay tuned — the Portal: Prelude RTX mod will be streaming on GeForce NOW to members soon. Plus, […]
( 5
min )
For over a decade, Xbox has been leveraging AI to elevate gaming. Haiyan Zhang, GM of Gaming AI, explores the collaborations behind the work and the potential for generative AI to support better experiences for both players and game creators.
The post Collaborators: Gaming AI with Haiyan Zhang appeared first on Microsoft Research.
( 29
min )
Large language models (LLMs) can be used to analyze complex documents and provide summaries and answers to questions. The post Domain-adaptation Fine-tuning of Foundation Models in Amazon SageMaker JumpStart on Financial data describes how to fine-tune an LLM using your own dataset. Once you have a solid LLM, you’ll want to expose that LLM to […]
( 7
min )
Amazon SageMaker Model Cards enable you to standardize how models are documented, thereby achieving visibility into the lifecycle of a model, from designing, building, training, and evaluation. Model cards are intended to be a single source of truth for business and technical metadata about the model that can reliably be used for auditing and documentation […]
( 7
min )
Saildrone is making a splash in autonomous oceanic monitoring. The startup’s nautical data collection technology has tracked hurricanes up close in the North Atlantic, discovered a 3,200-foot underwater mountain in the Pacific Ocean and begun to help map the entirety of the world’s ocean floor. Based in the San Francisco Bay Area, the company develops […]
( 6
min )
Undetected partial discharges (PDs) are a safety-critical issue in high
voltage (HV) gas insulated systems (GIS). While the diagnosis of PDs under AC
voltage is well-established, the analysis of PDs under DC voltage remains an
active research field. A key focus of these investigations is the
classification of different PD sources to enable subsequent sophisticated
analysis.
In this paper, we propose and analyze a neural network-based approach for
classifying PD signals caused by metallic protrusions and conductive particles
on the insulator of HVDC GIS, without relying on pulse sequence analysis
features. In contrast to previous approaches, our proposed model can
discriminate the studied PD signals obtained at negative and positive
potentials, while also generalizing to unseen operating voltage multiples.
Additionally, we compare the performance of time- and frequency-domain input
signals and explore the impact of different normalization schemes to mitigate
the influence of free-space path loss between the sensor and defect location.
( 2
min )
Determining clinically relevant physiological states from multivariate time
series data with missing values is essential for providing appropriate
treatment for acute conditions such as Traumatic Brain Injury (TBI),
respiratory failure, and heart failure. Utilizing non-temporal clustering or
data imputation and aggregation techniques may lead to loss of valuable
information and biased analyses. In our study, we apply the SLAC-Time
algorithm, an innovative self-supervision-based approach that maintains data
integrity by avoiding imputation or aggregation, offering a more useful
representation of acute patient states. By using SLAC-Time to cluster data in a
large research dataset, we identified three distinct TBI physiological states
and their specific feature profiles. We employed various clustering evaluation
metrics and incorporated input from a clinical domain expert to validate and
interpret the identified physiological states. Further, we discovered how
specific clinical events and interventions can influence patient states and
state transitions.
( 2
min )
Exfiltration of data via email is a serious cybersecurity threat for many
organizations. Detecting data exfiltration (anomaly) patterns typically
requires labeling, most often done by a human annotator, to reduce the high
number of false alarms. Active Learning (AL) is a promising approach for
labeling data efficiently, but it needs to choose an efficient order in which
cases are to be labeled, and there are uncertainties as to what scoring
procedure should be used to prioritize cases for labeling, especially when
detecting rare cases of interest is crucial. We propose an adaptive AL sampling
strategy that leverages the underlying prior data distribution, as well as
model uncertainty, to produce batches of cases to be labeled that contain
instances of rare anomalies. We show that (1) the classifier benefits from a
batch of representative and informative instances of both normal and anomalous
examples, and (2) unsupervised anomaly detection plays a useful role in building
the classifier in the early stages of training when relatively little labeling
has been done thus far. Our approach to AL for anomaly detection outperformed
existing AL approaches on three highly unbalanced UCI benchmarks and on one
real-world redacted email data set.
( 2
min )
This report presents the technical details of our submission on the EGO4D
Audio-Visual (AV) Automatic Speech Recognition Challenge 2023 from the
OxfordVGG team. We present WhisperX, a system for efficient speech
transcription of long-form audio with word-level time alignment, along with two
text normalisers, which are publicly available. Our final submission obtained
a Word Error Rate (WER) of 56.0% on the challenge test set, ranking 1st on the
leaderboard. All baseline codes and models are available on
https://github.com/m-bain/whisperX.
( 2
min )
Recent right-to-be-forgotten regulation has sparked considerable interest in
unlearning pre-trained machine learning models. Approximating the
straightforward yet expensive retrain-from-scratch approach, recent machine
unlearning methods unlearn a sample by updating the weights to remove its
influence on the model. In this paper, we introduce a simple yet effective
approach to remove the influence of data on a deep generative model. Inspired
by works in multi-task learning, we propose to manipulate gradients to
regularize the interplay of influence among samples by projecting gradients
onto the normal plane of the gradients to be retained. Our approach is
agnostic to the statistics of the removal samples, outperforming existing
baselines while providing, for the first time, a theoretical analysis of
unlearning a generative model.
( 2
min )
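A minimal sketch of the projection step described above (the surrounding
unlearning loop, the losses, and the sign conventions are assumptions of this
illustration):

    import torch

    def project_to_normal_plane(g_forget, g_retain, eps=1e-12):
        """Remove from the unlearning gradient its component along the retained
        gradient, so the update is orthogonal to directions that matter for the
        samples to be kept (to first order, their loss is unchanged).
        Both inputs are assumed to be flattened gradient vectors, e.g. built
        with torch.nn.utils.parameters_to_vector."""
        coeff = torch.dot(g_forget, g_retain) / (torch.dot(g_retain, g_retain) + eps)
        return g_forget - coeff * g_retain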
In this paper we present a new classification method based on Dictionary
Learning (DL). The main contribution consists of a kernel version of incoherent
DL, derived from its standard linear counterpart. We also propose an
improvement of the AK-SVD algorithm concerning the representation update. Our
algorithms are tested on several popular databases of classification problems.
( 2
min )
In this letter, we propose the use of a meta-learning based precoder
optimization framework to directly optimize the Rate-Splitting Multiple Access
(RSMA) precoders with partial Channel State Information at the Transmitter
(CSIT). By exploiting the overfitting of the compact neural network to maximize
the explicit Average Sum-Rate (ASR) expression, we effectively bypass the need
for any other training data while minimizing the total running time. Numerical
results reveal that the meta-learning based solution achieves similar ASR
performance to conventional precoder optimization in medium-scale scenarios,
and significantly outperforms sub-optimal low complexity precoder algorithms in
the large-scale regime.
( 2
min )
When solving math problems, most language models use a sampling strategy,
predicting the next word according to conditional probabilities. In the math
reasoning steps, this may generate wrong answers. Considering that math
problems are deterministic, we propose a mixed policy exploration approach to
solve math problems with reinforcement learning. In particular, we propose a
two-level token exploration policy: the abstract level explores the next token
probabilistically, while the second level is deterministic. Specifically, the
abstract-level policy decides whether the token is an operator or an operand
by probability sampling, while the second level deterministically selects the
next token with the highest score in a greedy way. We test our method on the
GSM8K dataset with a GPT-2 model,
and demonstrate more than $2\%$ performance gain. Our implementation is
available at https://github.com/vividitytech/math_lm_rl.
( 2
min )
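A hedged sketch of the two-level policy (the partition of the vocabulary into
operator and operand ids is an assumption of this illustration, not a detail
taken from the paper):

    import torch

    def two_level_decode(probs, operator_ids, operand_ids):
        """probs: (V,) next-token distribution from the language model.
        Level 1 samples the token *type* from its aggregate probability;
        level 2 greedily picks the highest-scoring token of that type."""
        p_op = probs[operator_ids].sum()
        p_num = probs[operand_ids].sum()
        take_operator = torch.rand(()) < p_op / (p_op + p_num)  # probabilistic
        ids = operator_ids if take_operator else operand_ids
        return ids[probs[ids].argmax()]                         # deterministic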
We study the generalization properties of batched predictors, i.e., models
tasked with predicting the mean label of a small set (or batch) of examples.
The batched prediction paradigm is particularly relevant for models deployed to
determine the quality of a group of compounds in preparation for offline
testing. By utilizing a suitable generalization of the Rademacher complexity,
we prove that batched predictors come with exponentially stronger
generalization guarantees as compared to the standard per-sample approach.
Surprisingly, the proposed bound holds independently of overparametrization.
Our theoretical insights are validated experimentally for various tasks,
architectures, and applications.
( 2
min )
High-dimensional clinical data have become invaluable resources for genetic
studies, due to their accessibility in biobank-scale datasets and the
development of high-performance modeling techniques, especially using deep
learning. Recent work has shown that low dimensional embeddings of these
clinical data learned by variational autoencoders (VAE) can be used for
genome-wide association studies and polygenic risk prediction. In this work, we
consider multiple unsupervised learning methods for learning disentangled
representations, namely autoencoders, VAE, beta-VAE, and FactorVAE, in the
context of genetic association studies. Using spirograms from UK Biobank as a
running example, we observed improvements in the number of genome-wide
significant loci, heritability, and performance of polygenic risk scores for
asthma and chronic obstructive pulmonary disease by using FactorVAE or
beta-VAE, compared to standard VAE or non-variational autoencoders. FactorVAEs
performed effectively across multiple values of the regularization
hyperparameter, while beta-VAEs were much more sensitive to the hyperparameter
values.
( 2
min )
Amazon Lex is a service that allows you to quickly and easily build conversational bots (“chatbots”), virtual agents, and interactive voice response (IVR) systems for applications such as Amazon Connect. Artificial intelligence (AI) and machine learning (ML) have been a focus for Amazon for over 20 years, and many of the capabilities that customers use […]
( 10
min )
In today’s digital world, most consumers would rather find answers to their customer service questions on their own rather than taking the time to reach out to businesses and/or service providers. This blog post explores an innovative solution to build a question and answer chatbot in Amazon Lex that uses existing FAQs from your website. […]
( 9
min )
Spam emails, also known as junk mail, are sent to a large number of users at once and often contain scams, phishing content, or cryptic messages. Spam emails are sometimes sent manually by a human, but most often they are sent using a bot. Examples of spam emails include fake ads, chain emails, and impersonation […]
( 6
min )
Today, we are excited to announce that Llama 2 foundation models developed by Meta are available for customers through Amazon SageMaker JumpStart. The Llama 2 family of large language models (LLMs) is a collection of pre-trained and fine-tuned generative text models ranging in scale from 7 billion to 70 billion parameters. Fine-tuned LLMs, called Llama-2-chat, […]
( 14
min )
The “Portal: Prelude RTX” gaming mod — a remastering of the popular unofficial “Portal” prequel — comes with full ray tracing, DLSS 3 and RTX IO technology for cutting-edge, AI-powered graphics that rejuvenate the legendary mod for gamers, creators, developers and others to experience it anew.
( 7
min )
A new $5+ million partnership aims to explore ways the development of artificial intelligence (AI) can support a thriving, innovative local news field, and ensure local news organizations shape the future of this emerging technology.
( 3
min )
EECS professor appointed to new professorship in the MIT Schwarzman College of Computing.
( 6
min )
As compute power and data have become more available with cloud computing, machine learning (ML) is now making an impact across every industry and is a core part of every business and industry. Amazon SageMaker Studio is the first fully integrated ML development environment (IDE) with a web-based visual interface. You can perform all ML development […]
( 10
min )
The generalization performance of deep neural networks with regard to the
optimization algorithm is one of the major concerns in machine learning. This
performance can be affected by various factors. In this paper, we theoretically
prove that the Lipschitz constant of the loss function is an important factor
in reducing the generalization error of the output model obtained by Adam or
AdamW. The results can be used as a guideline for choosing the loss function
when the optimization algorithm is Adam or AdamW. In addition, to evaluate the
theoretical bound in a practical setting, we choose the human age estimation
problem in computer vision. For assessing the generalization better, the
training and test datasets are drawn from different distributions. Our
experimental evaluation shows that the loss function with a lower Lipschitz
constant and maximum value improves the generalization of the model trained by
Adam or AdamW.
( 2
min )
Despite the dominance and effectiveness of scaling, resulting in large
networks with hundreds of billions of parameters, the necessity to train
overparametrized models remains poorly understood, and alternative approaches
do not necessarily make it cheaper to train high-performance models. In this
paper, we explore low-rank training techniques as an alternative approach to
training large neural networks. We introduce a novel method called ReLoRA,
which utilizes low-rank updates to train high-rank networks. We apply ReLoRA to
pre-training transformer language models with up to 350M parameters and
demonstrate comparable performance to regular neural network training.
Furthermore, we observe that the efficiency of ReLoRA increases with model
size, making it a promising approach for training multi-billion-parameter
networks efficiently. Our findings shed light on the potential of low-rank
training techniques and their implications for scaling laws.
( 2
min )
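As a hedged sketch of the core idea, low-rank updates are periodically merged
into the otherwise frozen full-rank weights, so the accumulated update can
exceed the rank of any single factor. The rank, init scale, and restart
schedule here are illustrative assumptions, and the full method involves
further details (e.g. around optimizer state) that this sketch omits.

    import torch
    import torch.nn as nn

    class ReLoRALinear(nn.Module):
        def __init__(self, d_in, d_out, rank=8):
            super().__init__()
            self.weight = nn.Parameter(torch.randn(d_out, d_in) * 0.02,
                                       requires_grad=False)  # frozen base weight
            self.A = nn.Parameter(torch.randn(rank, d_in) * 0.02)  # trainable
            self.B = nn.Parameter(torch.zeros(d_out, rank))        # trainable

        def forward(self, x):
            return x @ (self.weight + self.B @ self.A).T

        @torch.no_grad()
        def merge_and_restart(self):
            """Fold the low-rank product into the base weight and re-initialize
            the factors; repeated restarts let the total update become
            high-rank even though each segment is rank-limited."""
            self.weight += self.B @ self.A
            nn.init.normal_(self.A, std=0.02)
            nn.init.zeros_(self.B)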
We present APAC-Net, an alternating population and agent control neural
network for solving stochastic mean field games (MFGs). Our algorithm is geared
toward high-dimensional instances of MFGs that are beyond reach with existing
solution methods. We achieve this in two steps. First, we take advantage of the
underlying variational primal-dual structure that MFGs exhibit and phrase it as
a convex-concave saddle point problem. Second, we parameterize the value and
density functions by two neural networks, respectively. By phrasing the problem
in this manner, solving the MFG can be interpreted as a special case of
training a generative adversarial network (GAN). We show the potential of our
method on up to 100-dimensional MFG problems.
( 2
min )
Federated learning (FL) has evolved as a prominent method for edge devices to
cooperatively create a unified prediction model while keeping their sensitive
training data local to the device. Despite the existence of numerous research
frameworks for simulating FL algorithms, they do not facilitate comprehensive
deployment for automatic speech recognition tasks on heterogeneous edge
devices. This is where Ed-Fed, a comprehensive and generic FL framework, comes
in as a foundation for future practical FL system research. We also propose a
novel resource-aware client selection algorithm to optimise the waiting time in
the FL settings. We show that our approach can handle the straggler devices and
dynamically set the training time for the selected devices in a round. Our
evaluation has shown that the proposed approach significantly optimises waiting
time in FL compared to conventional random client selection methods.
( 2
min )
The current cut selection algorithm used in mixed-integer programming solvers
has remained largely unchanged since its creation. In this paper, we propose a
set of new cut scoring measures, cut filtering techniques, and stopping
criteria, extending the current state-of-the-art algorithm and obtaining a 4\%
performance improvement for SCIP over the MIPLIB 2017 benchmark set.
( 2
min )
Controlling nonlinear dynamical systems using machine learning makes it
possible not only to drive systems into simple behavior like periodicity but
also into more complex arbitrary dynamics. For this, it is crucial that a
machine learning system can be trained to reproduce the target dynamics
sufficiently well. Using the example of forcing a chaotic parametrization of
the Lorenz system into intermittent dynamics, we first show that classical
reservoir computing excels at this task. In a next step, we compare those
results, based on different amounts of training data, to an alternative setup
in which next-generation reservoir computing is used instead. It turns out
that while delivering comparable performance for usual amounts of training
data, next-generation RC significantly outperforms classical RC in situations
where only very limited data are available. This opens up further practical
control applications in real-world problems where data are restricted.
( 2
min )
Recent advances in large language models have led to renewed interest in
natural language processing in healthcare using the free text of clinical
notes. One distinguishing characteristic of clinical notes is their long time
span over multiple long documents. The unique structure of clinical notes
creates a new design choice: when the context length for a language model
predictor is limited, which part of clinical notes should we choose as the
input? Existing studies either choose the inputs with domain knowledge or
simply truncate them. We propose a framework to analyze the sections with high
predictive power. Using MIMIC-III, we show that: 1) predictive power
distribution is different between nursing notes and discharge notes and 2)
combining different types of notes could improve performance when the context
length is large. Our findings suggest that a carefully selected sampling
function could enable more efficient information extraction from clinical
notes.
( 2
min )
We propose a novel task-agnostic in-domain pre-training method that sits
between generic pre-training and fine-tuning. Our approach selectively masks
in-domain keywords, i.e., words that provide a compact representation of the
target domain. We identify such keywords using KeyBERT (Grootendorst, 2020). We
evaluate our approach using six different settings: three datasets combined
with two distinct pre-trained language models (PLMs). Our results reveal that
the fine-tuned PLMs adapted using our in-domain pre-training strategy
outperform PLMs that used in-domain pre-training with random masking as well as
those that followed the common pre-train-then-fine-tune paradigm. Further, the
overhead of identifying in-domain keywords is reasonable, e.g., 7-15% of the
pre-training time (for two epochs) for BERT Large (Devlin et al., 2019).
( 2
min )
Understanding how the statistical and geometric properties of neural activity
relate to performance is a key problem in theoretical neuroscience and deep
learning. Here, we calculate how correlations between object representations
affect the capacity, a measure of linear separability. We show that for
spherical object manifolds, introducing correlations between centroids
effectively pushes the spheres closer together, while introducing correlations
between the axes effectively shrinks their radii, revealing a duality between
correlations and geometry with respect to the problem of classification. We
then apply our results to accurately estimate the capacity of deep network
data.
( 2
min )
We analyze statistical discrimination in hiring markets using a multi-armed
bandit model. Myopic firms face workers arriving with heterogeneous observable
characteristics. The association between the worker's skill and characteristics
is unknown ex ante; thus, firms need to learn it. Laissez-faire causes
perpetual underestimation: minority workers are rarely hired, and therefore,
the underestimation tends to persist. Even a marginal imbalance in the
population ratio frequently results in perpetual underestimation. We propose
two policy solutions: a novel subsidy rule (the hybrid mechanism) and the
Rooney Rule. Our results indicate that temporary affirmative actions
effectively alleviate discrimination stemming from insufficient data.
( 2
min )
Hypothesis transfer learning (HTL) contrasts with domain adaptation by
allowing a previous task, named the source, to be leveraged in a new one, the
target, without requiring access to the source data. Indeed, HTL relies only
on a hypothesis learnt from such source data, relieving the hurdle of
expensive data storage and providing great practical benefits. Hence, HTL is
highly beneficial for real-world applications relying on big data. The
analysis of such a method
from a theoretical perspective faces multiple challenges, particularly in
classification tasks. This paper deals with this problem by studying the
learning theory of HTL through algorithmic stability, an attractive theoretical
framework for machine learning algorithms analysis. In particular, we are
interested in the statistical behaviour of the regularized empirical risk
minimizers in the case of binary classification. Our stability analysis
provides learning guarantees under mild assumptions. Consequently, we derive
several complexity-free generalization bounds for essential statistical
quantities like the training error, the excess risk and cross-validation
estimates. These refined bounds make it possible to understand the benefits of
transfer learning and to compare the behaviour of standard losses in different
scenarios, leading to valuable insights for practitioners.
( 2
min )
Understanding the implicit regularization imposed by neural network
architectures and gradient based optimization methods is a key challenge in
deep learning and AI. In this work we provide sharp results for the implicit
regularization imposed by the gradient flow of Diagonal Linear Networks (DLNs)
in the over-parameterized regression setting and, potentially surprisingly,
link this to the phenomenon of phase transitions in generalized hardness of
approximation (GHA). GHA generalizes the phenomenon of hardness of
approximation from computer science to, among others, continuous and robust
optimization. It is well-known that the $\ell^1$-norm of the gradient flow of
DLNs with tiny initialization converges to the objective function of basis
pursuit. We improve upon these results by showing that the gradient flow of
DLNs with tiny initialization approximates minimizers of the basis pursuit
optimization problem (as opposed to just the objective function), and we obtain
new and sharp convergence bounds w.r.t.\ the initialization size. Non-sharpness
of our results would imply that the GHA phenomenon would not occur for the
basis pursuit optimization problem -- which is a contradiction -- thus implying
sharpness. Moreover, we characterize $\textit{which}$ $\ell_1$ minimizer of the
basis pursuit problem is chosen by the gradient flow whenever the minimizer is
not unique. Interestingly, this depends on the depth of the DLN.
( 3
min )
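For reference, the basis pursuit program referred to above is the convex
problem (notation assumed: $A$ the measurement matrix, $b$ the observations)

$$\min_{x \in \mathbb{R}^n} \|x\|_1 \quad \text{subject to} \quad Ax = b.$$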
In this paper, we investigate the impact of numerical instability on the
reliability of sampling, density evaluation, and evidence lower bound (ELBO)
estimation in variational flows. We first empirically demonstrate that common
flows can exhibit a catastrophic accumulation of error: the numerical flow map
deviates significantly from the exact map -- which affects sampling -- and the
numerical inverse flow map does not accurately recover the initial input --
which affects density and ELBO computations. Surprisingly though, we find that
results produced by flows are often accurate enough for applications despite
the presence of serious numerical instability. In this work, we treat
variational flows as dynamical systems, and leverage shadowing theory to
elucidate this behavior via theoretical guarantees on the error of sampling,
density evaluation, and ELBO estimation. Finally, we develop and empirically
test a diagnostic procedure that can be used to validate results produced by
numerically unstable flows in practice.
( 2
min )
Large language models (LLMs) such as GPT-4 have caught the interest of many
scientists. Recent studies suggested that these models could be useful in
chemistry and materials science. To explore these possibilities, we organized a
hackathon.
This article chronicles the projects built as part of this hackathon.
Participants employed LLMs for various applications, including predicting
properties of molecules and materials, designing novel interfaces for tools,
extracting knowledge from unstructured data, and developing new educational
applications.
The diverse topics and the fact that working prototypes could be generated in
less than two days highlight that LLMs will profoundly impact the future of our
fields. The rich collection of ideas and projects also indicates that the
applications of LLMs are not limited to materials science and chemistry but
offer potential benefits to a wide range of scientific disciplines.
( 3
min )
PIGINet leverages machine learning to streamline and enhance household robots' task and motion planning, by assessing and filtering feasible solutions in complex environments.
( 9
min )
A new report by MIT researchers highlights the potential of generative AI to help workers with certain writing assignments.
( 9
min )
We study the adaptation of soft actor-critic (SAC) from continuous action space
to discrete action space. We revisit vanilla SAC and provide an in-depth
understanding of its Q value underestimation and performance instability issues
when applied to discrete settings. We thereby propose entropy-penalty and
double average Q-learning with Q-clip to address these issues. Extensive
experiments on typical benchmarks with discrete action space, including Atari
games and a large-scale MOBA game, show the efficacy of our proposed method.
Our code is at: https://github.com/coldsummerday/Revisiting-Discrete-SAC.
( 2
min )
The coupling of deep reinforcement learning to numerical flow control
problems has recently received considerable attention, leading to
groundbreaking results and opening new perspectives for the domain. Due to the
usually high computational cost of fluid dynamics solvers, the use of parallel
environments during the learning process represents an essential ingredient to
attain efficient control in a reasonable time. Yet, most of the deep
reinforcement learning literature for flow control relies on on-policy
algorithms, for which the massively parallel transition collection may break
theoretical assumptions and lead to suboptimal control models. To overcome this
issue, we propose a parallelism pattern relying on partial-trajectory buffers
terminated by a return bootstrapping step, allowing a flexible use of parallel
environments while preserving the on-policiness of the updates. This approach
is illustrated on a CPU-intensive continuous flow control problem from the
literature.
( 2
min )
When measuring rare processes at Belle II, a huge luminosity is required,
which means a large number of simulations are necessary to determine signal
efficiencies and background contributions. However, this process demands high
computation costs, while most of the simulated data, in particular in the case of
background, are discarded by the event selection. Thus, filters using graph
neural networks are introduced at an early stage to save the resources for the
detector simulation and reconstruction of events discarded at analysis level.
In our work, we improved the performance of the filters using graph attention
and investigated statistical methods including sampling and reweighting to deal
with the biases introduced by the filtering.
( 2
min )
For prediction of clustered time-to-event data, we propose a new deep neural
network based gamma frailty model (DNN-FM). An advantage of the proposed model
is that the joint maximization of the new h-likelihood provides maximum
likelihood estimators for fixed parameters and best unbiased predictors for
random frailties. Thus, the proposed DNN-FM is trained by using a negative
profiled h-likelihood as a loss function, constructed by profiling out the
non-parametric baseline hazard. Experimental studies show that the proposed
method enhances the prediction performance of the existing methods. A real data
analysis shows that the inclusion of subject-specific frailties helps to
improve prediction of the DNN based Cox model (DNN-Cox).
( 2
min )
The computation necessary for training Transformer-based language models has
skyrocketed in recent years. This trend has motivated research on efficient
training algorithms designed to improve training, validation, and downstream
performance faster than standard training. In this work, we revisit three
categories of such algorithms: dynamic architectures (layer stacking, layer
dropping), batch selection (selective backprop, RHO loss), and efficient
optimizers (Lion, Sophia). When pre-training BERT and T5 with a fixed
computation budget using such methods, we find that their training, validation,
and downstream gains vanish compared to a baseline with a fully-decayed
learning rate. We define an evaluation protocol that enables computation to be
done on arbitrary machines by mapping all computation time to a reference
machine which we call reference system time. We discuss the limitations of our
proposed protocol and release our code to encourage rigorous research in
efficient training procedures: https://github.com/JeanKaddour/NoTrainNoGain.
( 2
min )
Recently Chen and Poor initiated the study of learning mixtures of linear
dynamical systems. While linear dynamical systems already have wide-ranging
applications in modeling time-series data, using mixture models can lead to a
better fit or even a richer understanding of underlying subpopulations
represented in the data. In this work we give a new approach to learning
mixtures of linear dynamical systems that is based on tensor decompositions. As
a result, our algorithm succeeds without strong separation conditions on the
components, and can be used to compete with the Bayes optimal clustering of the
trajectories. Moreover our algorithm works in the challenging
partially-observed setting. Our starting point is the simple but powerful
observation that the classic Ho-Kalman algorithm is a close relative of modern
tensor decomposition methods for learning latent variable models. This gives us
a playbook for how to extend it to work with more complicated generative
models.
( 2
min )
We investigate a framework for binary image denoising via restricted
Boltzmann machines (RBMs) that introduces a denoising objective in quadratic
unconstrained binary optimization (QUBO) form and is well-suited for quantum
annealing. The denoising objective is attained by balancing the distribution
learned by a trained RBM with a penalty term for deviations from the noisy
image. We derive the statistically optimal choice of the penalty parameter
assuming the target distribution has been well-approximated, and further
suggest an empirically supported modification to make the method robust to that
idealistic assumption. We also show under additional assumptions that the
denoised images attained by our method are, in expectation, strictly closer to
the noise-free images than the noisy images are. While we frame the model as an
image denoising model, it can be applied to any binary data. As the QUBO
formulation is well-suited for implementation on quantum annealers, we test the
model on a D-Wave Advantage machine, and also test on data too large for
current quantum annealers by approximating QUBO solutions through classical
heuristics.
( 2
min )
We consider stochastic optimization problems where data is drawn from a
Markov chain. Existing methods for this setting crucially rely on knowing the
mixing time of the chain, which in real-world applications is usually unknown.
We propose the first optimization method that does not require the knowledge of
the mixing time, yet obtains the optimal asymptotic convergence rate when
applied to convex problems. We further show that our approach can be extended
to: (i) finding stationary points in non-convex optimization with Markovian
data, and (ii) obtaining better dependence on the mixing time in temporal
difference (TD) learning; in both cases, our method is completely oblivious to
the mixing time. Our method relies on a novel combination of multi-level Monte
Carlo (MLMC) gradient estimation together with an adaptive learning method.
( 2
min )
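A hedged sketch of an MLMC gradient estimator of this flavor (the geometric
level distribution and the truncation are standard choices assumed here, not
details taken from the paper; the truncation makes the estimator only
approximately unbiased for the deepest-level average):

    import numpy as np

    def mlmc_gradient(grad_stream, max_level=10, rng=None):
        """Combine averages of 2^J consecutive stochastic gradients, with J
        drawn geometrically, so the estimator tracks a heavily averaged
        gradient while its *expected* per-step cost stays small.
        grad_stream: iterator yielding gradient arrays along the Markov chain."""
        rng = rng or np.random.default_rng()
        g0 = next(grad_stream)                       # level-0: a single gradient
        J = min(int(rng.geometric(0.5)), max_level)  # P(J = j) = 2^{-j}
        n = 2 ** J
        samples = [g0] + [next(grad_stream) for _ in range(n - 1)]
        fine = np.mean(samples, axis=0)              # average of 2^J gradients
        coarse = np.mean(samples[: n // 2], axis=0)  # average of first 2^{J-1}
        # inverse-probability weighting makes the telescoping sum cancel
        return g0 + (2.0 ** J) * (fine - coarse)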
We propose a goodness-of-fit measure for probability densities modeling
observations with varying dimensionality, such as text documents of differing
lengths or variable-length sequences. The proposed measure is an instance of
the kernel Stein discrepancy (KSD), which has been used to construct
goodness-of-fit tests for unnormalized densities. The KSD is defined by its
Stein operator: current operators used in testing apply to fixed-dimensional
spaces. As our main contribution, we extend the KSD to the variable-dimension
setting by identifying appropriate Stein operators, and propose a novel KSD
goodness-of-fit test. As with the previous variants, the proposed KSD does not
require the density to be normalized, allowing the evaluation of a large class
of models. Our test is shown to perform well in practice on discrete sequential
data benchmarks.
( 2
min )
Stochastic Gradient Descent (SGD) is one of the simplest and most popular
algorithms in modern statistical and machine learning due to its computational
and memory efficiency. Various averaging schemes have been proposed to
accelerate the convergence of SGD in different settings. In this paper, we
explore a general averaging scheme for SGD. Specifically, we establish the
asymptotic normality of a broad range of weighted averaged SGD solutions and
provide asymptotically valid online inference approaches. Furthermore, we
propose an adaptive averaging scheme that exhibits both optimal statistical
rate and favorable non-asymptotic convergence, drawing insights from the
optimal weight for the linear model in terms of non-asymptotic mean squared
error (MSE).
( 2
min )
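As a hedged illustration of weighted iterate averaging (the polynomial
weighting below is one common scheme, not necessarily the adaptive scheme
proposed in the paper):

    import numpy as np

    def sgd_with_weighted_averaging(grad, x0, n_steps, lr=0.01, eta=2.0):
        """Run SGD and maintain an online weighted average of the iterates.
        Weight w_t = (t + 1)^eta emphasizes late iterates; eta = 0 recovers
        plain Polyak-Ruppert averaging. `grad(x)` is assumed to return an
        unbiased stochastic gradient at x."""
        x = np.asarray(x0, dtype=float).copy()
        avg = x.copy()
        weight_sum = 0.0
        for t in range(n_steps):
            x -= lr * grad(x)
            w = (t + 1.0) ** eta
            weight_sum += w
            avg += (w / weight_sum) * (x - avg)  # online weighted running mean
        return avg

    # e.g. for the noisy quadratic 0.5 * ||x - 1||^2:
    # rng = np.random.default_rng(0)
    # grad = lambda x: (x - 1.0) + 0.1 * rng.standard_normal(x.shape)
    # sgd_with_weighted_averaging(grad, np.zeros(3), 5000)  # close to [1, 1, 1]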
Recent years have shown amazing growth in deep learning neural networks (DNNs). This growth can be seen in more accurate models and even opening new possibilities with generative AI: large language models (LLMs) that synthesize natural language, text-to-image generators, and more. These increased capabilities of DNNs come with the cost of having massive models that […]
( 11
min )
A watershed moment on Nov. 22, 2022, was mostly virtual, yet it shook the foundations of nearly every industry on the planet. On that day, OpenAI released ChatGPT, the most advanced artificial intelligence chatbot ever developed. This set off demand for generative AI applications that help businesses become more efficient, from providing consumers with answers […]
( 11
min )
Arise, members! Capcom’s legendary role-playing game Dragon’s Dogma: Dark Arisen joins the GeForce NOW library today. The RPG and THQ Nordic’s Jagged Alliance 3 are newly supported on GeForce NOW, playable on nearly any device. From Dusk Till Pawn: Become the Arisen and take up the challenge in Capcom’s critically acclaimed RPG. Set in a […]
( 5
min )
Sponsored Post: Attend the Data Science Symposium 2022 on November 8. The Center for Business Analytics at the University of Cincinnati will present its annual Data Science Symposium 2022 on November 8. This all-day in-person event will have three featured speakers and two tech talk tracks with four concurrent presentations in each track. The […]
The post Attend the Data Science Symposium 2022, November 8 in Cincinnati appeared first on Machine Learning Mastery.
( 10
min )